Linked Data Wrapper Curation: A Platform Perspective
Abstract
Linked Data Wrappers (LDWs) turn Web APIs into RDF end-points, enriching the LOD cloud with current data. This potential is frequently undervalued, with LDWs regarded as mere by-products of larger endeavors, e.g. developing mashup applications. However, LDWs are mainly data-driven and not contaminated by application semantics, and hence have significant potential for reuse. If LDWs could be decoupled from their breakout projects, their chances of becoming true RDF end-points would increase. But this vision is still threatened by LDW fragility upon API upgrades and by the risk of unmaintained LDWs. LDW curation might help. Similar to dataset curation, LDW curation aims to clean up datasets but, in this case, the dataset is implicitly described by the LDW definition, and “stains” are not limited to those related to dataset quality but also include those related to the underlying API. This requires LDW platforms that build on existing code repositories, adding functionality that caters for LDW definition, deployment and curation. This paper contributes to this vision by: (1) identifying a set of requirements for LDW platforms; (2) instantiating these requirements for Yahoo’s YQL; and (3) validating the extent to which this approach facilitates LDW curation.
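To make the idea of a wrapper concrete, the following is a minimal sketch of lifting one Web API response into RDF, written in plain Python rather than in YQL. It is not the paper's implementation: the sample payload, the field-to-predicate mapping, the `example.org` namespace and the `lift` function are all assumptions chosen for illustration. It also hints at the fragility the abstract mentions, since a renamed or dropped API field surfaces as an error in the wrapper.

```python
import json

# Hypothetical payload, standing in for what a Web API might return.
SAMPLE_RESPONSE = '{"id": "42", "title": "Sunset", "owner": "alice"}'

# Hypothetical mapping from API field names to RDF predicate IRIs
# (Dublin Core terms are used here purely as an example vocabulary).
FIELD_TO_PREDICATE = {
    "title": "http://purl.org/dc/terms/title",
    "owner": "http://purl.org/dc/terms/creator",
}

# Assumed namespace for the resources minted by the wrapper.
BASE_IRI = "http://example.org/photo/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def lift(response_text: str) -> list[str]:
    """Turn one JSON API response into a list of N-Triples lines.

    Raises KeyError when an expected field is absent, which is how an
    API upgrade that renames or drops a field would show up here.
    """
    record = json.loads(response_text)
    subject = f"<{BASE_IRI}{record['id']}>"
    triples = []
    for field, predicate in FIELD_TO_PREDICATE.items():
        value = escape_literal(str(record[field]))
        triples.append(f'{subject} <{predicate}> "{value}" .')
    return triples


if __name__ == "__main__":
    for line in lift(SAMPLE_RESPONSE):
        print(line)
```

Keeping the mapping in a single declarative table, as in this sketch, is one way a wrapper definition can stay data-driven and free of application logic; a curation step could then amount to editing that table when the underlying API changes.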
Similar resources
LinkedCT Live: Platform for Online Curation of Clinical Trials Data
The goal of the Linked Clinical Trials (LinkedCT) project is to transform the data published on ClinicalTrials.gov into a high-quality knowledge base published as Linked Data on the Web. In this demonstration, we present the platform we have developed for both online curation of clinical trials data into linked data, and for rapid Web application development on top of this linked data. We also s...
Study of the foundation, models and issues of research data curation and management in scientific and academic environments
Background and Aim: The purpose of this paper is to study, identify and discuss the foundation and concepts, models and frameworks, dimensions and challenges of research data curation and management in scientific and academic environments. Method: This article is a review article, and a library method was used to collect scientific and research texts in this field. In this research, external an...
Semantic Research Platform for Model Organisms
Model organisms such as budding yeast provide a common platform to interrogate and understand cellular and physiological processes. Knowledge about model organisms, whether generated during the course of scientific investigations or extracted from published articles, is integrated and made available by model organism databases (MODs) such as the Saccharomyces Genome Database (SGD). SGD uses I...
Expertise Modelling in Community-driven Knowledge Curation Platforms
Expertise modelling has been the subject of extensive research in two main disciplines: Information Retrieval (IR) and Social Network Analysis (SNA). Both IR and SNA techniques build the expertise model through a document-centric approach, providing a macro-perspective on the knowledge emerging from large corpora of static documents. With the emergence of the Web of Data, there has been a signific...
Self Training Wrapper Induction with Linked Data
This work explores the usage of Linked Data for Web scale Information Extraction, with focus on the task of Wrapper Induction. We show how to effectively use Linked Data to automatically generate training material and build a self-trained Wrapper Induction method. Experiments on a publicly available dataset demonstrate that for covered domains, our method can achieve F measure of 0.85, which is...
Publication date: 2017